<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.29
     from lex.tnf on 19 December 2010 -->

<TITLE>Lexical Analysis - Table of Contents</TITLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#FF0000" BACKGROUND="gifs/bg.gif">
<TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0 VALIGN=BOTTOM>
<TR VALIGN=BOTTOM>
<TD WIDTH="160" VALIGN=BOTTOM>
<A HREF="http://eli-project.sourceforge.net/">
<IMG SRC="gifs/elilogo.gif" BORDER=0>
</A>&nbsp;
</TD>
<TD WIDTH="25" VALIGN=BOTTOM>
<img src="gifs/empty.gif" WIDTH=25 HEIGHT=25>
</TD>
<TD ALIGN=LEFT WIDTH="475" VALIGN=BOTTOM>
<A HREF="index.html"><IMG SRC="gifs/title.png" BORDER=0></A>
</TD>
<!-- |DELETE FOR SOURCEFORGE LOGO|
<TD>
<a href="http://sourceforge.net/projects/eli-project">
<img
  src="http://sflogo.sourceforge.net/sflogo.php?group_id=70447&amp;type=13"
  width="120" height="30"
  alt="Get Eli: Translator Construction Made Easy at SourceForge.net.
    Fast, secure and Free Open Source software downloads"/>
</a>
</TD>
|DELETE FOR SOURCEFORGE LOGO| -->
</TR>
</TABLE>

<HR size=1 noshade width=785 align=left>
<TABLE BORDER=0 CELLSPACING=2 CELLPADDING=0>
<TR>
<TD VALIGN=TOP WIDTH="160">
<h4>General Information</h4>

<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="index.html">Eli: Translator Construction Made Easy</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="gindex_1.html#SEC1">Global Index</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="faq_toc.html" >Frequently Asked Questions</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="ee.html" >Typical Eli Usage Errors</a> </td></tr>
</table>

<h4>Tutorials</h4>

<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="EliRefCard_toc.html">Quick Reference Card</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="novice_toc.html">Guide for New Eli Users</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="news_toc.html">Release Notes of Eli</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="nametutorial_toc.html">Tutorial on Name Analysis</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="typetutorial_toc.html">Tutorial on Type Analysis</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="ee.html" >Typical Eli Usage Errors</a> </td></tr>
</table>

<h4>Reference Manuals</h4>

<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="ui_toc.html">User Interface</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="pp_toc.html">Eli products and parameters</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lidoref_toc.html">LIDO Reference Manual</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="ee.html" >Typical Eli Usage Errors</a> </td></tr>
</table>

<h4>Libraries</h4>

<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lib_toc.html">Eli library routines</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="modlib_toc.html">Specification Module Library</a></td></tr>
</table>

<h4>Translation Tasks</h4>

<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lex_toc.html">Lexical analysis specification</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="syntax_toc.html">Syntactic Analysis Manual</a></td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="comptrees_toc.html">Computation in Trees</a></td></tr>
</table>

<h4>Tools</h4>

<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="lcl_toc.html">LIGA Control Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="show_toc.html">Debugging Information for LIDO</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="gorto_toc.html">Graphical ORder TOol</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="fw_toc.html">FunnelWeb User's Manual</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="ptg_toc.html">Pattern-based Text Generator</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="deftbl_toc.html">Property Definition Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="oil_toc.html">Operator Identification Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="tp_toc.html">Tree Grammar Specification Language</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="clp_toc.html">Command Line Processing</a> </td></tr>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="cola_toc.html">COLA Options Reference Manual</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="idem_toc.html">Generating Unparsing Code</a> </td></tr>
</table>
<p>
<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="mon_toc.html">Monitoring a Processor's Execution</a> </td></tr>
</table>

<h4>Administration</h4>

<table BORDER=0 CELLSPACING=0 CELLPADDING=0>
<tr valign=top><td><img src="gifs/gelbekugel.gif" WIDTH=7 HEIGHT=7 ALT=" o"> </td><td><a href="sysadmin_toc.html">System Administration Guide</a> </td></tr>
</table>

<HR WIDTH="100%">
<A HREF="mailto:eli-project-users@lists.sourceforge.net">
<IMG SRC="gifs/button_mail.gif" BORDER=0 ALIGN="left"></A>
<A HREF="index.html"><IMG SRC="gifs/home.gif" BORDER=0 ALIGN="right"></A>

</TD>
<TD VALIGN=TOP WIDTH="25"><img src="gifs/empty.gif" WIDTH=25 HEIGHT=25></TD>

<TD VALIGN=TOP WIDTH="600">
<A HREF="lex.pdf"><IMG SRC="gifs/pdficon_large.gif" ALT="Open PDF File" BORDER="0" ALIGN=RIGHT></A>
<H1>Lexical Analysis</H1>
<P>
The purpose of the lexical analyzer is to partition the input text,
delivering a sequence of <DFN>comments</DFN> and <DFN>basic symbols</DFN>.
Comments are character sequences to be ignored, while basic symbols are
character sequences that correspond to terminal symbols of the grammar
defining the phrase structure of the input
(see  <A HREF="syntax_1.html#SEC1">Context-Free Grammars and Parsing of Syntactic Analysis</A>).
<P>
A user must define the forms of comments and the forms of all basic symbols
corresponding to non-literal terminal symbols of the grammar.
Eli can deduce the form of a literal terminal symbol from the grammar
specification.
<P>
The definition consists of one or more type-<TT>`gla'</TT> files.
Each line of a type-<TT>`gla'</TT> file describes a set of character sequences.
If a line begins with an identifier followed by a colon (<KBD>:</KBD>), then all
of the character sequences described by the line are instances of
the non-literal terminal symbol named by that identifier;
otherwise they are comments.
<P>
Here is an example of a type-<TT>`gla'</TT> file:
<P>
<PRE>
HexInteger:  $0[Xx][0-9A-Fa-f]+
             $!  (auxEOL)
Identifier:  C_IDENTIFIER
</PRE>
<P>
The first line of this specification uses a regular expression to define a
hexadecimal integer as a zero, followed by the letter <CODE>X</CODE> (either
upper or lower case) and one or more hexadecimal digits represented in the
usual way.
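The second line does not begin with an identifier and a colon, so it
defines one form of comment.
That form is given by a regular expression and the name of a C routine,
<CODE>auxEOL</CODE>, written in parentheses.
The C routine is invoked when the regular expression has been matched.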
This approach allows the user to define character sequences operationally
when a declarative definition is tedious or does not support appropriate
error reporting.
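<P>
As a further sketch of the same two kinds of line, the following fragment
defines a hypothetical basic symbol <CODE>OctInteger</CODE> and a second
comment form.
The symbol name and the choice of two hyphens as a comment introducer are
assumptions made for illustration only; the fragment also assumes, as the
name suggests, that <CODE>auxEOL</CODE> consumes the remainder of the line.
<P>
<PRE>
OctInteger:  $0[0-7]+
             $--  (auxEOL)
</PRE>
<P>
Because the second line of the fragment carries no identifier and colon,
the character sequences it describes would be treated as comments.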
<P>
Since certain lexical structures are common to many languages, Eli provides
a library of definitions that can be invoked simply by giving their names.
<CODE>C_IDENTIFIER</CODE>, in the third line, is such an invocation.
The effect of the third line is to define the form of the basic symbol
<CODE>Identifier</CODE> as that of an identifier in C: a letter or underscore
followed by some sequence of letters, digits and underscores.
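<P>
For comparison, the same form could be written out by hand as a regular
expression.
The line below is a sketch derived from the description above, not a copy
of the library definition; the canned description may attach further
processing, so Chapter 2 remains the authority on what
<CODE>C_IDENTIFIER</CODE> actually stands for.
<P>
<PRE>
Identifier:  $[A-Za-z_][A-Za-z_0-9]*
</PRE>
<P>
Using the canned name keeps the specification shorter and avoids
transcription errors in the character classes.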
<P>
Chapter 1 defines the usage, form and content of specifications provided by
the user as type-<TT>`gla'</TT> files.
Those specifications may refer to canned descriptions, which are defined in
Chapter 2.
Chapter 3 presents the default processing of spaces, tabs and newlines and
explains how to define other strategies.
The treatment and meaning of literal terminal symbols are discussed in
Chapter 4, and Chapter 5 explains how a generated lexical analyzer can be
made insensitive to the case of letters.
Complex lexical analysis problems may require modification of the behavior
of the generated module; Chapter 6 discusses the possibilities.
<P>
<UL>
<LI><A NAME="SEC1" HREF="lex_1.html#SEC1">Specifications</A>
<UL>
<LI><A NAME="SEC2" HREF="lex_1.html#SEC2">Regular Expressions</A>
<UL>
<LI><A NAME="SEC3" HREF="lex_1.html#SEC3">Matching operator characters</A>
<LI><A NAME="SEC4" HREF="lex_1.html#SEC4">Character classes</A>
<LI><A NAME="SEC5" HREF="lex_1.html#SEC5">Building complex regular expressions</A>
<LI><A NAME="SEC6" HREF="lex_1.html#SEC6">What happens if the specification is ambiguous?</A>
</UL>
<LI><A NAME="SEC7" HREF="lex_1.html#SEC7">Auxiliary Scanners</A>
<UL>
<LI><A NAME="SEC8" HREF="lex_1.html#SEC8">Available scanners</A>
<LI><A NAME="SEC9" HREF="lex_1.html#SEC9">Building scanners</A>
</UL>
<LI><A NAME="SEC10" HREF="lex_1.html#SEC10">Token Processors</A>
<UL>
<LI><A NAME="SEC11" HREF="lex_1.html#SEC11">Available processors</A>
<LI><A NAME="SEC12" HREF="lex_1.html#SEC12">Building processors</A>
</UL>
</UL>
<LI><A NAME="SEC13" HREF="lex_2.html#SEC13">Canned Symbol Descriptions</A>
<UL>
<LI><A NAME="SEC14" HREF="lex_2.html#SEC14">Available Descriptions</A>
<LI><A NAME="SEC15" HREF="lex_2.html#SEC15">Definitions of Canned Descriptions</A>
</UL>
<LI><A NAME="SEC16" HREF="lex_3.html#SEC16">Spaces, Tabs and Newlines</A>
<UL>
<LI><A NAME="SEC17" HREF="lex_3.html#SEC17">Maintaining the Source Text Coordinates</A>
<LI><A NAME="SEC18" HREF="lex_3.html#SEC18">Restoring the Default Behavior for White Space</A>
<LI><A NAME="SEC19" HREF="lex_3.html#SEC19">Making White Space Illegal</A>
</UL>
<LI><A NAME="SEC20" HREF="lex_4.html#SEC20">Literal Symbols</A>
<UL>
<LI><A NAME="SEC21" HREF="lex_4.html#SEC21">Overriding the Default Treatment of Literal Symbols</A>
<LI><A NAME="SEC22" HREF="lex_4.html#SEC22">Using Literal Symbols to Represent Other Things</A>
</UL>
<LI><A NAME="SEC23" HREF="lex_5.html#SEC23">Case Insensitivity</A>
<UL>
<LI><A NAME="SEC24" HREF="lex_5.html#SEC24">A Case-Insensitive Token Processor</A>
<LI><A NAME="SEC25" HREF="lex_5.html#SEC25">Making Literal Symbols Case Insensitive</A>
</UL>
<LI><A NAME="SEC26" HREF="lex_6.html#SEC26">The Generated Lexical Analyzer Module</A>
<UL>
<LI><A NAME="SEC27" HREF="lex_6.html#SEC27">Interaction Between the Lexical Analyzer and the Text</A>
<LI><A NAME="SEC28" HREF="lex_6.html#SEC28">Resetting the Scan Pointer</A>
<LI><A NAME="SEC29" HREF="lex_6.html#SEC29">The Classification Operation</A>
<UL>
<LI><A NAME="SEC30" HREF="lex_6.html#SEC30">Setting coordinate values</A>
<LI><A NAME="SEC31" HREF="lex_6.html#SEC31">Deciding on a continuation after a classification</A>
<LI><A NAME="SEC32" HREF="lex_6.html#SEC32">Returning a classification</A>
</UL>
<LI><A NAME="SEC33" HREF="lex_6.html#SEC33">An Example of Interface Usage</A>
</UL>
<LI><A NAME="SEC34" HREF="lex_7.html#SEC34">Index</A>
</UL>
<HR size=1 noshade width=600 align=left>
</TD>
</TR>
</TABLE>

</BODY></HTML>
